Apache Nutch
Common Crawl – publicly available internet-wide crawls; started using Nutch in 2014.
DiscoverEd – open educational resources search prototype developed by Creative Commons.
Krugle – uses Nutch to crawl web pages for code, archives and technically interesting content.